AITopics | linguistic bias

linguistic bias

CLARITY: Contextual Linguistic Adaptation and Accent Retrieval for Dual-Bias Mitigation in Text-to-Speech Generation

Poon, Crystal Min Hui, Ng, Pai Chet, Miao, Xiaoxiao, Loh, Immanuel Jun Kai, Zhang, Bowen, Song, Haoyu, Mcloughlin, Ian

arXiv.org Artificial Intelligence | Nov-17-2025

Instruction-guided text-to-speech (TTS) research has reached a maturity level where excellent speech generation quality is possible on demand, yet two coupled biases persist: accent bias, where models default to dominant phonetic patterns, and linguistic bias, where dialect-specific lexical and cultural cues are ignored. These biases are interdependent, as authentic accent generation requires both accent fidelity and localized text. We present Contextual Linguistic Adaptation and Retrieval for Inclusive TTS sYnthesis (CLARITY), a backbone-agnostic framework that addresses these biases through dual-signal optimization: (i) contextual linguistic adaptation that localizes input text to the target dialect, and (ii) retrieval-augmented accent prompting (RAAP) that supplies accent-consistent speech prompts. Across twelve English accents, CLARITY improves accent accuracy and fairness while maintaining strong perceptual quality.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2511.11104

Country:

Asia (1.00)
North America > United States (0.46)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
(3 more...)

EqualizeIR: Mitigating Linguistic Biases in Retrieval Models

Cheng, Jiali, Amiri, Hadi

arXiv.org Artificial Intelligence | Apr-11-2025

This study finds that existing information retrieval (IR) models show significant biases based on the linguistic complexity of input queries, performing well on linguistically simpler (or more complex) queries while underperforming on linguistically more complex (or simpler) queries. To address this issue, we propose EqualizeIR, a framework to mitigate linguistic biases in IR models. EqualizeIR uses a linguistically biased weak learner to capture linguistic biases in IR datasets and then trains a robust model by regularizing and refining its predictions using the biased weak learner. This approach effectively prevents the robust model from overfitting to specific linguistic patterns in data. We propose four approaches for developing linguistically-biased models. Extensive experiments on several datasets show that our method reduces performance disparities across linguistically simple and complex queries, while improving overall retrieval performance.
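The abstract above does not spell out how the biased weak learner regularizes the robust model. As a rough illustration of the general idea only, the sketch below uses example reweighting, a generic debiasing technique that is swapped in here and is not claimed to be the EqualizeIR objective. The function name, the binary relevance setup, and the toy probabilities are all assumptions made for illustration.

```python
import math


def reweighted_loss(robust_probs, biased_probs, labels):
    """Weighted cross-entropy for a robust model (illustrative sketch).

    robust_probs / biased_probs: predicted probability of relevance (label 1)
    from the robust model and from the biased weak learner, per example.
    Examples that the biased learner already gets right with high confidence
    receive a small weight, so the robust model gains little from them.
    """
    total, weight_sum = 0.0, 0.0
    for p_robust, p_biased, y in zip(robust_probs, biased_probs, labels):
        # Probability each model assigns to the true label.
        p_r = p_robust if y == 1 else 1.0 - p_robust
        p_b = p_biased if y == 1 else 1.0 - p_biased
        weight = 1.0 - p_b  # hypothetical weighting rule, not from the paper
        total += weight * -math.log(max(p_r, 1e-12))
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else 0.0


# Toy example: the first query is easy for the biased learner, the second is not.
loss = reweighted_loss([0.9, 0.6], [0.95, 0.5], [1, 1])
```

In this toy case the second example dominates the loss because the biased learner is uncertain about it, which is the behavior such reweighting is meant to produce.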

information retrieval, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2504.07115

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.38)

Modifying AI, Enhancing Essays: How Active Engagement with Generative AI Boosts Writing Quality

Yang, Kaixun, Raković, Mladen, Liang, Zhiping, Yan, Lixiang, Zeng, Zijie, Fan, Yizhou, Gašević, Dragan, Chen, Guanliang

arXiv.org Artificial Intelligence | Dec-10-2024

Students are increasingly relying on Generative AI (GAI) to support their writing, a key pedagogical practice in education. In GAI-assisted writing, students can delegate core cognitive tasks (e.g., generating ideas and turning them into sentences) to GAI while still producing high-quality essays. This creates new challenges for teachers in assessing and supporting student learning, as they often lack insight into whether students are engaging in meaningful cognitive processes during writing or how much of the essay's quality can be attributed to those processes. This study aimed to help teachers better assess and support student learning in GAI-assisted writing by examining how different writing behaviors, especially those indicative of meaningful learning versus those that are not, impact essay quality. Using a dataset of 1,445 GAI-assisted writing sessions, we applied the cutting-edge method, X-Learner, to quantify the causal impact of three GAI-assisted writing behavioral patterns (i.e., seeking suggestions but not accepting them, seeking suggestions and accepting them as they are, and seeking suggestions and accepting them with modification) on four measures of essay quality (i.e., lexical sophistication, syntactic complexity, text cohesion, and linguistic bias). Our analysis showed that writers who frequently modified GAI-generated text, suggesting active engagement in higher-order cognitive processes, consistently improved the quality of their essays in terms of lexical sophistication, syntactic complexity, and text cohesion. In contrast, those who often accepted GAI-generated text without changes, primarily engaging in lower-order processes, saw a decrease in essay quality. Additionally, while human writers tend to introduce linguistic bias when writing independently, incorporating GAI-generated text, even without modification, can help mitigate this bias.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.072

Country:

Oceania > Australia > Victoria > Melbourne (0.05)
South America > Uruguay > Maldonado > Maldonado (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.62)

Linguistic bias in ChatGPT: Language models reinforce dialect discrimination

AIHub | Sep-30-2024, 09:00:00 GMT

ChatGPT does amazingly well at communicating with people in English. Only 15% of ChatGPT users are from the US, where Standard American English is the default. But the model is also commonly used in countries and communities where people speak other varieties of English. Over 1 billion people around the world speak varieties such as Indian English, Nigerian English, Irish English, and African-American English. Speakers of these non-"standard" varieties often face discrimination in the real world.

chatgpt, discrimination, gpt-3, (12 more...)

AIHub

Country: North America > United States (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Intertwined Biases Across Social Media Spheres: Unpacking Correlations in Media Bias Dimensions

Liu, Yifan, Li, Yike, Wang, Dong

arXiv.org Artificial Intelligence | Aug-27-2024

Media bias significantly shapes public perception by reinforcing stereotypes and exacerbating societal divisions. Prior research has often focused on isolated media bias dimensions such as political bias or racial bias, neglecting the complex interrelationships among various bias dimensions across different topic domains. Moreover, we observe that models trained on existing media bias benchmarks fail to generalize effectively on recent social media posts, particularly in certain bias identification tasks. This shortfall primarily arises because these benchmarks do not adequately reflect the rapidly evolving nature of social media content, which is characterized by shifting user behaviors and emerging trends. In response to these limitations, our research introduces a novel dataset collected from YouTube and Reddit over the past five years. Our dataset includes automated annotations for YouTube content across a broad spectrum of bias dimensions, such as gender, racial, and political biases, as well as hate speech, among others. It spans diverse domains including politics, sports, healthcare, education, and entertainment, reflecting the complex interplay of biases across different societal sectors. Through comprehensive statistical analysis, we identify significant differences in bias expression patterns and intra-domain bias correlations across these domains. By utilizing our understanding of the correlations among various bias dimensions, we lay the groundwork for creating advanced systems capable of detecting multiple biases simultaneously. Overall, our dataset advances the field of media bias identification, contributing to the development of tools that promote fairer media consumption. The comprehensive awareness of existing media bias fosters more ethical journalism, promotes cultural sensitivity, and supports a more informed and equitable public discourse.

bias dimension, dimension, media bias, (16 more...)

arXiv.org Artificial Intelligence

2408.15406

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(5 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Government (0.93)
Media > News (0.87)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Towards Bridging the Digital Language Divide

Bella, Gábor, Helm, Paula, Koch, Gertraud, Giunchiglia, Fausto

arXiv.org Artificial Intelligence | Jul-25-2023

It is a well-known fact that current AI-based language technology (language models, machine translation systems, multilingual dictionaries and corpora) focuses on the world's 2-3% most widely spoken languages. Recent research efforts have attempted to expand the coverage of AI technology to "under-resourced languages." The goal of our paper is to bring attention to a phenomenon that we call linguistic bias: multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden representational preference towards certain languages. Linguistic bias is manifested in uneven per-language performance even in the case of similar test conditions. We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented, and that can even become ethically problematic as they disregard valuable aspects of diversity as well as the needs of the language communities themselves. As our attempt at building diversity-aware language resources, we present a new initiative that aims at reducing linguistic bias through both technological design and methodology, based on an eye-level collaboration with local communities.
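To make "uneven per-language performance" concrete, the following minimal sketch summarizes a set of per-language scores by their mean, spread, and best-to-worst gap. The function name, the language codes, and the scores are invented for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def per_language_report(scores):
    """Summarize per-language scores for one system under one test setup.

    scores: dict mapping a language code to a score in [0, 1].
    Returns the mean alongside spread statistics, since the mean alone
    can hide a large gap between the best and worst served languages.
    """
    values = list(scores.values())
    best = max(scores, key=scores.get)
    worst = min(scores, key=scores.get)
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "gap": scores[best] - scores[worst],
        "best": best,
        "worst": worst,
    }


# Made-up accuracies: two well-resourced and two under-resourced languages.
report = per_language_report({"en": 0.90, "de": 0.85, "sw": 0.55, "yo": 0.50})
```

Here the mean is about 0.70 while the gap between the best and worst language is about 0.40, so reporting the spread next to the average is what exposes the disparity.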

computational linguistic, diversity, linguistic bias, (15 more...)

arXiv.org Artificial Intelligence

2307.13405

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > UK North Sea (0.07)
Atlantic Ocean > North Atlantic Ocean > North Sea > UK North Sea (0.07)
(19 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

On the Amplification of Linguistic Bias through Unintentional Self-reinforcement Learning by Generative Language Models -- A Perspective

Lee, Minhyeok

arXiv.org Artificial Intelligence | Jun-12-2023

Generative Language Models (GLMs) have the potential to significantly shape our linguistic landscape due to their expansive use in various digital applications. However, this widespread adoption might inadvertently trigger a self-reinforcement learning cycle that can amplify existing linguistic biases. This paper explores the possibility of such a phenomenon, where the initial biases in GLMs, reflected in their generated text, can feed into the learning material of subsequent models, thereby reinforcing and amplifying these biases. Moreover, the paper highlights how the pervasive nature of GLMs might influence the linguistic and cognitive development of future generations, as they may unconsciously learn and reproduce these biases. The implications of this potential self-reinforcement cycle extend beyond the models themselves, impacting human language and discourse. The advantages and disadvantages of this bias amplification are weighed, considering educational benefits and ease of future GLM learning against threats to linguistic diversity and dependence on initial GLMs. This paper underscores the need for rigorous research to understand and address these issues. It advocates for improved model transparency, bias-aware training techniques, development of methods to distinguish between human and GLM-generated text, and robust measures for fairness and bias evaluation in GLMs. The aim is to ensure the effective, safe, and equitable use of these powerful technologies, while preserving the richness and diversity of human language.

glm, linguistic bias, training data, (11 more...)

arXiv.org Artificial Intelligence

2306.07135

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.40)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)

Average Is Not Enough: Caveats of Multilingual Evaluation

Pikuliak, Matúš, Šimko, Marián

arXiv.org Artificial Intelligence | Jan-3-2023

We believe that this is an often overlooked tool in our research toolkit that should be used more to ensure that we are able to properly interpret results from multilingual evaluation and detect various linguistic biases and problems. In addition to this discussion, which we consider a contribution in itself, we also propose a visualization based on URIEL typological database (Littell et al., 2017) as an example of such qualitative analysis, and we show that it is able to discover linguistic biases in published results.

[...] to improvements of various multilingual technologies, such as machine translation (Arivazhagan et al., 2019), multilingual language models (Devlin et al., 2019; Conneau and Lample, 2019), crosslingual transfer learning (Pikuliak et al., 2021) or language independent representations (Ruder et al., 2019). It is now possible to create well-performing multilingual methods for many tasks. When dealing with multilingual methods, we need to be able to evaluate how good they really are, i.e. how effective [...]

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2301.01269

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Regional Negative Bias in Word Embeddings Predicts Racial Animus--but only via Name Frequency

van Loon, Austin, Giorgi, Salvatore, Willer, Robb, Eichstaedt, Johannes

arXiv.org Artificial Intelligence | Jan-20-2022

The word embedding association test (WEAT) is an important method for measuring linguistic biases against social groups such as ethnic minorities in large text corpora. It does so by comparing the semantic relatedness of words prototypical of the groups (e.g., names unique to those groups) and attribute words (e.g., 'pleasant' and 'unpleasant' words). We show that anti-Black WEAT estimates from geo-tagged social media data at the level of metropolitan statistical areas strongly correlate with several measures of racial animus, even when controlling for sociodemographic covariates. However, we also show that every one of these correlations is explained by a third variable: the frequency of Black names in the underlying corpora relative to White names. This occurs because word embeddings tend to group positive (negative) words and frequent (rare) words together in the estimated semantic space. As the frequency of Black names on social media is strongly correlated with Black Americans' prevalence in the population, this results in spurious anti-Black WEAT estimates wherever few Black Americans live. This suggests that research using the WEAT to measure bias should consider term frequency, and also demonstrates the potential consequences of using black-box models like word embeddings to study human cognition and behavior.
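For readers unfamiliar with the test, here is a minimal sketch of the standard WEAT effect size: the difference in mean association between two target word sets, divided by the standard deviation of the associations over all target words. The two-dimensional toy vectors are invented, the use of the sample standard deviation is an assumption, and the sketch does not reproduce the paper's regional analysis or its frequency controls.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Effect size comparing target sets X and Y on attribute sets A and B."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # Assumption: sample standard deviation over the union of target words.
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Toy embeddings (made up): X lies near A, Y lies near B.
X = [[1.0, 0.1], [0.9, 0.2]]    # e.g. names associated with group 1
Y = [[0.1, 1.0], [0.2, 0.9]]    # e.g. names associated with group 2
A = [[1.0, 0.0], [0.95, 0.05]]  # e.g. "pleasant" attribute words
B = [[0.0, 1.0], [0.05, 0.95]]  # e.g. "unpleasant" attribute words
effect = weat_effect_size(X, Y, A, B)
```

A positive value means X is more strongly associated with A than Y is. Nothing in this computation accounts for how often each word occurs in the corpus, which is the confound the abstract warns about.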

frequency, msa, proportion, (14 more...)

arXiv.org Artificial Intelligence

2201.08451

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)

Free speech -- Linguistic bias in NLP

#artificialintelligence | Oct-1-2021, 07:25:11 GMT

As a species, we've been conversing for half a million years. In the course of that time, we've learned not only to speak and to listen, but also to extract the actual meaning behind what's been said. We evolved to grasp the tiniest emotional, cultural and contextual hints that are codified into our pronunciation, intonation and articulation. The acoustic elements of speech sounds from our unique voices are just peanuts compared to the variation in our manners of speaking: our choice of words and jargon, and our use of elements such as jokes, figures of speech, telling stories, asking questions, apologising; everything that reflects our personalities. Human intelligence includes the unique skill of cooperation.

free speech, linguistic bias, nlp

#artificialintelligence

Industry: Law > Civil Rights & Constitutional Law (0.40)

Technology: Information Technology > Artificial Intelligence (0.41)
